Managing Large Collections
Authors
Abstract
Model building is a key objective of data mining and data analysis applications. In the past, such applications required only a few models built by a single data analyst. As more and more data has been collected and real-world problems have become more complex, it has become increasingly difficult for that data analyst to build all the required models and manage them manually. Building a system to help data analysts construct and manage large collections of models is a pressing issue. Consider a credit-card marketing application. The credit-card-issuing company wishes to build models describing the behavior of small segments of customers, or microsegments. Examples are middle-aged customers with children in college living in zip code 10012 and graduate engineering students at university XYZ. A large credit-card company might have to deal with tens of thousands of such microsegments, each involving dozens of different models. Therefore, it may need to build and support hundreds of thousands of models. Similar problems also occur in personalization applications and e-commerce. The traditional approach is to aggregate the data into large segments and then use domain knowledge combined with "intelligent" model-building methods to produce a few good models. Here, "intelligent" means selecting the right functions and model types based on automatic algorithms and domain-expert knowledge. This approach reduces the number of models, but it does not eliminate the need to build and manage many of them. Data analysts and naive users alike in information-intensive organizations need automated ways to build, analyze, and maintain very large collections of data mining models.
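The scale described above can be made concrete with a small sketch. The following Python fragment is an illustration only, not the system discussed in the article: the segment labels, the model types, and the train_model stub are assumptions. It builds one placeholder model per (microsegment, model type) pair and counts them, showing how tens of thousands of segments times dozens of model types quickly yields hundreds of thousands of models to keep track of.

    # A minimal sketch of why model counts explode: one model per
    # (microsegment, model type) pair. All names below are illustrative.
    from itertools import product

    def train_model(segment_id, model_type, data=None):
        # Placeholder for an actual learner (e.g., a classifier fit on the
        # transactions of one microsegment); returns a tagged model record.
        return {"segment": segment_id, "type": model_type, "fitted": True}

    # Hypothetical scale: tens of thousands of microsegments, dozens of model types.
    microsegments = [f"segment_{i}" for i in range(20_000)]
    model_types = [f"model_type_{j}" for j in range(12)]

    # The model repository is keyed so individual models can be found,
    # refreshed, or retired without manual bookkeeping by a single analyst.
    model_repository = {
        (seg, mtype): train_model(seg, mtype)
        for seg, mtype in product(microsegments, model_types)
    }

    print(len(model_repository))  # 240,000 models: hundreds of thousands, as in the text

Even this toy repository makes the point: no single analyst can build, validate, and maintain a collection of this size by hand, which is the motivation for automated model-management support.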
Similar Resources
Managing Very Large Scientific Data
We discuss issues in managing very large scientific data collections and describe our approach at the San Diego Supercomputer Center for supporting high performance data-intensive applications. Our systems provide metadata-based access to data sets and support collections with widely varying data characteristics.
Managing large collections of business process models - Current techniques and challenges
Nowadays, business process management is an important approach for managing organizations from an operational perspective. As a consequence, it is common to see organizations develop collections of hundreds or even thousands of business process models. Such large collections of process models bring new challenges and provide new opportunities, as the knowledge that they encapsulate requires to ...
Datasets for the Grid
Introduction The grid provides a framework for managing and processing very large collections of data. Files are a very important unit for data handling but are not convenient for expressing a collective data view because large data collections must span a large number of files. The large data volume also makes it desirable to express some data collections as collections or subsets of existing ...
A Framework for Business Process Model Repositories
Large organizations often run hundreds or even thousands of business processes. Managing such large collections of business processes is a challenging task. Intelligent software can assist in that task by providing common repository functions such as storage, search and version management. They can also provide advanced functions that are specific for managing collections of process models, suc...
Managing Very Large Databases and Data Warehousing
Major libraries have large collections and circulation. Managing libraries electronically has resulted in the creation and management of large library databases. The interconnection of libraries and sharing resources across libraries has resulted in the management of very large databases. Most large and/or multinational industries worldwide have exploited such opportunities by applying data war...
GERINDO: Managing and Retrieving Information in Large Document Collections
We present in this report a summary of the main results produced in the five years of the GERINDO research project. The aim of this project is to address the increasing demand for software tools capable of dealing with information available in large document collections, such as the World Wide Web. It involves efforts of researchers from three Brazilian universities to develop core technologies...